Abstract

Sequential estimation of the success probability $p$ in inverse binomial sampling is
considered in this paper. For any estimator $\hat{p}$, its quality is measured by the risk
associated with normalized loss functions of linear-linear or inverse-linear form. These
functions are possibly asymmetric, with arbitrary slope parameters $a$ and $b$ for
$\hat{p} < p$ and $\hat{p} > p$ respectively. Interest in these functions is motivated by
their significance and potential uses, which are briefly discussed. Estimators are given
for which the risk has an asymptotic value as $p \to 0$, and which guarantee that, for any
$p \in (0, 1)$, the risk is lower than its asymptotic value. This allows selecting the
required number of successes, $r$, to meet a prescribed quality irrespective of the
unknown $p$. In addition, the proposed estimators are shown to be approximately minimax
when $a/b$ does not deviate too much from 1, and asymptotically minimax as $r \to \infty$
when $a = b$.

Keywords: Sequential estimation, Point estimator, Inverse binomial sampling, 
Asymmetric loss function. 

1 Motivation and considered loss functions 

The estimation of the success probability $p$ of a sequence of Bernoulli trials is a
recurring problem, arising in many branches of science and engineering. The quality of a
point estimator of $p$, denoted as $\hat{p}$, can be measured in terms of its risk, or
average loss associated with a certain loss function $L$. Since a given error is most
meaningful when compared with the true value $p$, quality measures used in practice are
most often normalized ones (Mendo & Hernando, 2010a). Common loss functions include
normalized square error $(\hat{p}/p - 1)^2$ and normalized absolute error
$|\hat{p}/p - 1|$. Interval estimation can also be analyzed in terms of a certain loss
function (Berger, 1985, p. 64) such that the resulting risk is the confidence level
associated with an estimation interval.

Fixed-sample approaches to this problem suffer from the drawback that the re- 
quired size depends on the unknown parameter p, and thus cannot be determined 
in advance. Therefore a sequential procedure is required, consisting in a stopping 
rule, which yields a random sample size, and an estimator based on the observed 
sample. A particularly appealing stopping rule is inverse binomial sampling. Given 
$r \in \mathbb{N}$, this rule consists in taking as many observations as necessary to
obtain exactly $r$ successes. The random number of observations, $N$, is a sufficient
statistic for $p$ (Lehmann & Casella, 1998, p. 101).
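The stopping rule just described can be illustrated with a short simulation. The following Python sketch is only an illustration: the function name `inverse_binomial_sample`, the seed and the values $p = 0.2$, $r = 5$ are assumptions chosen for the example, not part of the paper.

```python
import random

def inverse_binomial_sample(p, r, rng):
    """Run Bernoulli(p) trials until exactly r successes occur; return the number of trials N."""
    n = 0
    successes = 0
    while successes < r:
        n += 1
        if rng.random() < p:
            successes += 1
    return n

# Hypothetical example values (not taken from the paper).
rng = random.Random(0)
draws = [inverse_binomial_sample(0.2, 5, rng) for _ in range(20000)]
mean_n = sum(draws) / len(draws)  # the mean of a negative binomial N is r / p = 25
```

Each draw is at least $r$, and the sample mean should be close to $r/p$, which reflects that the sample size adapts to the unknown $p$.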

Inverse binomial sampling, first discussed by Haldane (1945), has received a great deal of
attention, motivated by the useful properties of the obtained estimators. Namely, it has
been shown that for an estimator $\hat{p} = g(N)$ such that $\lim_{n \to \infty} n g(n)$
exists and is positive, and for a general class of loss functions defined by certain
regularity conditions, the risk has an asymptotic value as $p \to 0$ (Mendo & Hernando,
2010a). Moreover, estimators have been found whose risk for $p$ arbitrary is guaranteed
not to exceed its asymptotic value, for the specific cases of normalized mean square error
(Mikulski & Smith, 1976), normalized mean absolute error (Mendo, 2009) and confidence
associated with a relative interval (Mendo & Hernando, 2008; Mendo & Hernando, 2010b).
This allows selecting an appropriate value of $r$ that meets a prescribed risk
irrespective of the unknown $p$.

In all mentioned cases, the loss incurred by a negative error equals that of the
corresponding positive error. In practice, however, situation-specific factors may render
underestimation more or less costly than overestimation (Christoffersen & Diebold, 1997;
Akdeniz, 2004). Consider for example $p = 0.01$ and two possible values of $\hat{p}$,
namely 0.019 and 0.001. The absolute error (normalized or otherwise) is the same for both
values of the estimator, as is the squared error. Nevertheless, with the first estimate
$\hat{p}$ is 1.9 times $p$, whereas with the second $p$ is 10 times $\hat{p}$. In many
applications it may be advisable to assign a higher loss to the second estimate. With
absolute error, this could be accomplished by generalizing the loss function to one with a
different slope on each side. Denoting $x = \hat{p}/p$, this generalized loss is given by

$$
L(x) = \begin{cases} a(1 - x) & \text{if } x \le 1, \\ b(x - 1) & \text{if } x > 1, \end{cases} \tag{1}
$$



with parameters $a, b \ge 0$, $(a, b) \ne (0, 0)$. This function, known as (normalized)
linear-linear loss, frequently arises in applications; see for example Granger (1969) and
Christoffersen & Diebold (1997). Another proposed function (not considered in this paper)
which gives different weights to positive and negative errors is the linear-exponential
loss, whose normalized version is $L(x) = b(\exp(a(x - 1)) - a(x - 1) - 1)$, with
parameters $a \ne 0$, $b > 0$ (Akdeniz, 2004). The ratio $a/b$, in the linear-linear loss,
or the parameter $a$, in the linear-exponential, controls the relative importance given to
underestimation and overestimation. Note that in both cases the loss due to
underestimation is bounded, unlike that of overestimation, which may be arbitrarily large.
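As a small worked example of the linear-linear loss (1), the sketch below evaluates it for the two estimates discussed above ($p = 0.01$, $\hat{p} = 0.019$ and $\hat{p} = 0.001$). The function name and the slope values $a = 2$, $b = 1$ are assumptions made for illustration only.

```python
def linear_linear_loss(x, a, b):
    """Normalized linear-linear loss (1), with x = p_hat / p."""
    return a * (1.0 - x) if x <= 1.0 else b * (x - 1.0)

p = 0.01
x_over = 0.019 / p    # overestimate, x = 1.9
x_under = 0.001 / p   # underestimate, x = 0.1

# Symmetric slopes: both estimates incur (numerically almost) the same loss.
sym = (linear_linear_loss(x_over, 1.0, 1.0), linear_linear_loss(x_under, 1.0, 1.0))
# Hypothetical asymmetric slopes a = 2, b = 1: underestimation is penalized more.
asym = (linear_linear_loss(x_over, 2.0, 1.0), linear_linear_loss(x_under, 2.0, 1.0))
```

With $a > b$ the second estimate receives the higher loss, which is the effect the asymmetric slopes are meant to achieve.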

In certain situations it may be meaningful to define loss as proportional to $\hat{p}/p$
or $p/\hat{p}$, whichever is largest. Thus with the values in the previous example, the
loss would be proportional to 1.9 and 10 respectively. In the following, the function
$s(\hat{p}, p) = \max\{\hat{p}/p, p/\hat{p}\}$ will be referred to as the symmetric ratio
of $\hat{p}$ and $p$ (the name is motivated by the fact that
$s(\hat{p}, p) = s(p, \hat{p})$). The loss thus defined is inherently normalized, because
it only depends on $\hat{p}$ and $p$ through $x = \hat{p}/p$. Subtracting 1 in order to
have a minimum loss equal to 0, the loss function is expressed as
$L(x) = \max\{x, 1/x\} - 1$. This function is unbounded for underestimation as well as for
overestimation errors. In fact, its graph is symmetric about $x = 1$ if $\hat{p}$ (or $x$)
is represented in logarithmic scale (this is obvious if $L(x)$ is written as
$\exp|\log x| - 1$). The risk corresponding to this loss is the mean symmetric ratio minus
1, and represents a normalized measure of dissimilarity between $\hat{p}$ and $p$, with
smaller values corresponding to better estimators. A generalization is obtained, as
before, by allowing different multiplicative parameters $a, b \ge 0$, $(a, b) \ne (0, 0)$
on each side of the function:



$$
L(x) = \begin{cases} a(1/x - 1) & \text{if } x \le 1, \\ b(x - 1) & \text{if } x > 1. \end{cases} \tag{2}
$$

This will be referred to as inverse-linear loss.
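The symmetric ratio and the inverse-linear loss (2) can be sketched in a few lines of Python. The function names and the numerical values (an estimate equal to twice and to half the true value) are assumptions chosen for illustration.

```python
def symmetric_ratio(p_hat, p):
    """s(p_hat, p) = max{p_hat / p, p / p_hat}."""
    return max(p_hat / p, p / p_hat)

def inverse_linear_loss(x, a, b):
    """Inverse-linear loss (2), with x = p_hat / p."""
    return a * (1.0 / x - 1.0) if x <= 1.0 else b * (x - 1.0)

p = 0.01                                   # hypothetical true value
ratio_double = symmetric_ratio(2 * p, p)   # estimate twice the true value
ratio_half = symmetric_ratio(p / 2, p)     # estimate half the true value
loss_double = inverse_linear_loss(2.0, 1.0, 1.0)
loss_half = inverse_linear_loss(0.5, 1.0, 1.0)
loss_half_asym = inverse_linear_loss(0.5, 3.0, 1.0)  # a = 3 weights underestimation more
```

For $a = b = 1$ an estimate that is off by a factor of 2 in either direction incurs the same loss, which is the symmetry in logarithmic scale noted above.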

The loss function (2), in addition to providing a natural measure of estimation
quality, namely generalized mean symmetric ratio, can be representative of incurred 
cost in specific applications. As an example, consider the production of a certain device 
which is subject to manufacturing defects, such as image sensors for digital cameras. 
Several factors in the production process (such as the presence of dust particles) may 
result in a sensor with specific pixels systematically showing incorrect information. 
Since it would be too expensive to discard all sensors that have some defect, the 
commonly adopted solution is as follows. Each produced sensor is tested, and if the 
number of defective pixels is not too large it is accepted. The location of such pixels 
is permanently recorded in the camera, so that they can be corrected as a part of the 
processing applied by the camera to generate the image. 

In high-quality camera models, however, it may be desirable to use sensors with 
an extremely low number of defects. A possible procedure is to classify each produced 
sensor as "premium" or "standard" , depending on whether the number of pixel defects 
is extremely low or merely acceptable. Premium sensors are reserved for advanced 
cameras, which incorporate high-quality lenses, whereas standard sensors are mounted 
in consumer-level cameras with average-quality lenses. For ease of exposition, these 
two types of lenses will also be referred to as premium and standard, respectively. The 
production of each type of lens is a more deterministic process than that of sensors, 
and thus the number of produced lenses of each type is easily controlled. 

It will be assumed that the manufacturer is primarily interested in its premium line of
cameras. A number $S$ of sensors is to be produced, and the number of premium lenses that
will be required needs to be planned in advance. To this end, an estimate $\hat{p}$ is
made of the proportion $p$ of sensors that will turn out to be of the premium type (this
can be done using inverse binomial sampling); and $S\hat{p}$ premium lenses are made
available. The actual proportion of premium sensors, $p$, may be lower than $\hat{p}$, in
which case some of the premium lenses will be left unused; or it may be greater, and then
some of the premium sensors will not be used. In either case, some resources are wasted.
If the cost associated with each unused part is $a$ for a sensor and $b$ for a lens, the
risk computed from the loss function (2) is the average cost of wasted resources per
assembled premium camera unit.
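The correspondence between wasted cost and the inverse-linear loss can be checked with a small calculation. The sketch below is a simplified illustration: it treats $Sp$ and $S\hat{p}$ as exact counts (ignoring randomness in production), and the function names and numbers ($S = 10000$, $p = 0.05$, $a = 3$, $b = 2$) are assumptions, not values from the paper.

```python
def inverse_linear_loss(x, a, b):
    """Inverse-linear loss (2), with x = p_hat / p."""
    return a * (1.0 / x - 1.0) if x <= 1.0 else b * (x - 1.0)

def wasted_cost_per_camera(S, p_true, p_hat, a, b):
    """Cost of unused parts divided by the number of assembled premium cameras."""
    sensors = S * p_true              # premium sensors actually obtained
    lenses = S * p_hat                # premium lenses made available
    cameras = min(sensors, lenses)    # each camera needs one sensor and one lens
    waste = a * max(sensors - lenses, 0.0) + b * max(lenses - sensors, 0.0)
    return waste / cameras

S, p_true, a, b = 10000, 0.05, 3.0, 2.0   # hypothetical values
cost_under = wasted_cost_per_camera(S, p_true, 0.04, a, b)  # p_hat < p: sensors wasted
cost_over = wasted_cost_per_camera(S, p_true, 0.06, a, b)   # p_hat > p: lenses wasted
loss_under = inverse_linear_loss(0.04 / p_true, a, b)
loss_over = inverse_linear_loss(0.06 / p_true, a, b)
```

In both cases the cost per assembled camera coincides with the loss (2) evaluated at $x = \hat{p}/p$; averaging over the distribution of $\hat{p}$ then gives the risk.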

The rest of the paper analyzes inverse binomial sampling under the loss functions (1) and
(2). The first has already been analyzed for the particular case $a = b$ by Mendo (2009),
and the generalization to $a \ne b$ will be seen to be rather straightforward. The second
function has not been dealt with before, to the authors' knowledge, and its analysis turns
out to be more difficult. Although the main focus of the paper is on the second, results
for the first are also interesting by themselves. In each case, estimators are given in
Section 2 such that the risk for $p \in (0, 1)$ is guaranteed to be lower than its
asymptotic value. Section 3 discusses these results and makes a comparison with the
optimum performance that could be achieved by using other estimators. It is shown that the
proposed estimators are approximately minimax if $a/b$ is close to 1; and for $a = b$ they
are asymptotically minimax as $r \to \infty$. Section 4 contains the proofs of all results.

2 Main results

Consider a sequence of Bernoulli trials with probability of success $p$, and a random
stopping time $N$ given by inverse binomial sampling with $r \in \mathbb{N}$. Let
$x^{(i)}$ denote $x(x - 1) \cdots (x - i + 1)$, for $x \in \mathbb{R}$, $i \in \mathbb{N}$;
and $x^{(0)} = 1$. The normalized lower incomplete gamma function is defined as

$$
\gamma(t, u) = \frac{1}{\Gamma(t)} \int_0^u s^{t-1} \exp(-s) \, ds. \tag{3}
$$

The random variable $N$ has a negative binomial distribution, with probability function
$f_r(n) = P[N = n]$ given by $f_r(n) = (n - 1)^{(r-1)} p^r (1 - p)^{n-r} / (r - 1)!$,
$n \ge r$. The corresponding distribution function will be denoted as $F_r(n)$. Similarly,
the probability function of a binomial random variable with parameters $n$ and $p$ is
denoted as $b_{n,p}(i) = n^{(i)} p^i (1 - p)^{n-i} / i!$, $0 \le i \le n$. For an arbitrary
nonrandomized estimator $\hat{p} = g(N)$ and a loss function $L(\hat{p}/p)$, the risk
$\eta(p)$ is

$$
\eta(p) = E[L(\hat{p}/p)] = \sum_{n=r}^{\infty} f_r(n) L(g(n)/p). \tag{4}
$$
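The series (4) can be evaluated numerically by truncation. The following Python sketch is a minimal illustration under stated assumptions: the helper names, the truncation point `n_max` and the values $r = 5$, $p = 0.1$ are choices made here, and the estimator used is $g(n) = (r - 1)/(n - 1)$.

```python
import math

def nb_pmf(n, r, p):
    """f_r(n) = P[N = n]: probability that the r-th success occurs at trial n."""
    return math.comb(n - 1, r - 1) * p**r * (1.0 - p)**(n - r)

def risk(g, loss, r, p, n_max=20000):
    """Truncated version of the series (4): sum of f_r(n) * L(g(n)/p), n = r..n_max."""
    return sum(nb_pmf(n, r, p) * loss(g(n) / p) for n in range(r, n_max + 1))

r, p = 5, 0.1                          # hypothetical example values
umvu = lambda n: (r - 1) / (n - 1)     # estimator g(n) = (r - 1)/(n - 1)
mean_ratio = risk(umvu, lambda x: x, r, p)            # E[p_hat / p]
abs_risk = risk(umvu, lambda x: abs(x - 1.0), r, p)   # normalized mean absolute error
```

Using the identity function in place of a loss gives $E[\hat{p}/p]$, which should equal 1 for this estimator since it is unbiased; this serves as a sanity check of the truncated sum.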

For $r \ge 2$ and $a, b \ge 0$, the loss functions (1) and (2) satisfy the sufficient
conditions of Mendo & Hernando (2010a, theorem 1), and thus any estimator
$\hat{p} = g(N)$ with $\lim_{n \to \infty} n g(n) = \Omega > 0$ has an asymptotic risk as
$p \to 0$, which can be computed as

$$
\lim_{p \to 0} \eta(p) = \frac{1}{(r-1)!} \int_0^\infty \nu^{r-1} \exp(-\nu) L(\Omega/\nu) \, d\nu. \tag{5}
$$

In particular, this holds for any estimator which can be expressed as

$$
\hat{p} = \frac{\Omega}{N + d} \tag{6}
$$

with $\Omega > 0$, $d > -r$.

Consider a generic estimator of the form (6). Denoting $m = \lfloor \Omega/p - d \rfloor$,
the risk associated with the loss function (1) can be written as

$$
\eta(p) = a \sum_{n=r}^{\infty} \left( 1 - \frac{\Omega}{(n+d)p} \right) f_r(n)
 + (a + b) \sum_{n=r}^{m} \left( \frac{\Omega}{(n+d)p} - 1 \right) f_r(n). \tag{7}
$$

Particularizing to $\Omega = r - 1$ and $d = -1$, which yields the uniformly minimum
variance unbiased (UMVU) estimator (Mikulski & Smith, 1976), and taking into account the
following identities (Mendo, 2009)

$$
f_{r-1}(n-1) = \frac{(r-1) f_r(n)}{(n-1)p} \quad \text{for } r \ge 2,\ n \ge r, \tag{8}
$$

$$
F_{r-1}(n-1) = F_r(n) + (1-p) b_{n-1,p}(r-1) \quad \text{for } r \ge 2,\ n \ge r, \tag{9}
$$

it is seen that, for $r \ge 2$, the first summand in (7) becomes 0, and

$$
\eta(p) = (a + b) \sum_{n=r}^{m} (f_{r-1}(n-1) - f_r(n)) = (a + b)(F_{r-1}(m-1) - F_r(m))
 = (a + b)(1 - p) b_{m-1,p}(r-1). \tag{10}
$$
The case $a = b = 1$ is analyzed in Mendo (2009). Comparing (10) with Mendo (2009,
eq. (12)), the expression of the risk for $a, b$ arbitrary is seen to be a straightforward
generalization of that for $a = b = 1$. As a consequence, the following result holds.

Theorem 1. Consider the loss function given by (1) with $a, b \ge 0$, $(a, b) \ne (0, 0)$.
For $r \ge 2$, the risk $\eta(p)$ associated with the estimator
$\hat{p} = (r - 1)/(N - 1)$ satisfies

$$
\eta(p) < \lim_{p \to 0} \eta(p) \quad \text{for any } p \in (0, 1), \tag{11}
$$

with

$$
\lim_{p \to 0} \eta(p) = \frac{(a + b)(r - 1)^{r-2} \exp(-r + 1)}{(r - 2)!}. \tag{12}
$$

In addition, as will be seen in Section 3, in certain conditions this estimator approaches
the asymptotically optimum estimator discussed in Mendo & Hernando (2010a).
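The closed-form risk (10) and the bound (12) of Theorem 1 can be checked numerically. The sketch below is an illustration under assumptions: the helper names, the truncation point and the test values ($r = 5$, $a = 2$, $b = 1$) are choices made here, not part of the paper.

```python
import math

def nb_pmf(n, r, p):
    """Negative binomial probability f_r(n)."""
    return math.comb(n - 1, r - 1) * p**r * (1.0 - p)**(n - r)

def binom_pmf(i, n, p):
    """Binomial probability b_{n,p}(i)."""
    return math.comb(n, i) * p**i * (1.0 - p)**(n - i)

def linear_linear_loss(x, a, b):
    return a * (1.0 - x) if x <= 1.0 else b * (x - 1.0)

def risk_direct(r, p, a, b, n_max=5000):
    """Truncated series (4) for p_hat = (r - 1)/(N - 1) under loss (1)."""
    return sum(nb_pmf(n, r, p) * linear_linear_loss((r - 1) / ((n - 1) * p), a, b)
               for n in range(r, n_max + 1))

def risk_closed(r, p, a, b):
    """Closed form (10), with m = floor(Omega/p - d) for Omega = r - 1, d = -1."""
    m = math.floor((r - 1) / p + 1)
    return (a + b) * (1.0 - p) * binom_pmf(r - 1, m - 1, p)

def risk_limit(r, a, b):
    """Asymptotic value (12)."""
    return (a + b) * (r - 1)**(r - 2) * math.exp(-(r - 1)) / math.factorial(r - 2)

r, a, b = 5, 2.0, 1.0   # hypothetical example values
gap = abs(risk_direct(r, 0.13, a, b) - risk_closed(r, 0.13, a, b))
below = all(risk_closed(r, p, a, b) < risk_limit(r, a, b)
            for p in (0.9, 0.5, 0.13, 0.01, 0.001))
```

The direct sum and the closed form should agree to rounding error, and the risk should stay below the asymptotic value for every tested $p$, approaching it as $p$ becomes small.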

For the loss function (2), the risk associated with an estimator of the form (6) can be
decomposed in a similar way as for (1). Namely, $\eta(p) = \eta_1(p) + \eta_2(p)$ with

$$
\eta_1(p) = a \sum_{n=m+1}^{\infty} \left( \frac{(n+d)p}{\Omega} - 1 \right) f_r(n), \tag{13}
$$

$$
\eta_2(p) = b \sum_{n=r}^{m} \left( \frac{\Omega}{(n+d)p} - 1 \right) f_r(n). \tag{14}
$$

Assuming $d \le 0$ in (13) and taking into account that, according to (8),
$n p f_r(n) = r f_{r+1}(n + 1)$, it follows that

$$
\eta_1(p) \le a \sum_{n=m+1}^{\infty} \left( \frac{np}{\Omega} - 1 \right) f_r(n)
 = \frac{ar}{\Omega} (1 - F_{r+1}(m+1)) + a (F_r(m) - 1), \tag{15}
$$

with strict inequality if $d < 0$. As for $\eta_2(p)$, assuming $d \ge -1$, it stems from
(8) and (14) that

$$
\eta_2(p) \le b \sum_{n=r}^{m} \left( \frac{\Omega}{(n-1)p} - 1 \right) f_r(n)
 = \frac{b\Omega}{r-1} F_{r-1}(m-1) - b F_r(m), \tag{16}
$$

with strict inequality if $d > -1$ and $m \ge r$. As a result of (15) and (16), for any
$d \in [-1, 0]$ the risk satisfies

$$
\eta(p) \le \frac{b\Omega}{r-1} F_{r-1}(m-1) + (a - b) F_r(m) - \frac{ar}{\Omega} F_{r+1}(m+1)
 + a \left( \frac{r}{\Omega} - 1 \right), \tag{17}
$$

with strict inequality if $m \ge r$ or $a > 0$. The right-hand side of (17) is greatly
simplified if $\Omega$ is chosen as any value $\Omega > 0$ which satisfies

$$
\frac{ar}{\Omega} - \frac{b\Omega}{r-1} = a - b, \tag{18}
$$

for in that case, applying the identity (9),

$$
\eta(p) \le \frac{b\Omega}{r-1} (F_{r-1}(m-1) - F_r(m)) + \frac{ar}{\Omega} (F_r(m) - F_{r+1}(m+1))
 + a \left( \frac{r}{\Omega} - 1 \right)
 = \frac{b\Omega}{r-1} (1-p) b_{m-1,p}(r-1) + \frac{ar}{\Omega} (1-p) b_{m,p}(r)
 + a \left( \frac{r}{\Omega} - 1 \right). \tag{19}
$$
The advantage of this expression is that the terms $(1 - p) b_{m-1,p}(r - 1)$ and
$(1 - p) b_{m,p}(r)$ lend themselves to analysis more easily than the distribution
functions in (17).

The condition (18) on $\Omega$ has a single positive solution for $a, b \ge 0$,
$(a, b) \ne (0, 0)$, namely

$$
\Omega = \begin{cases}
\dfrac{(b - a)(r - 1) + \sqrt{(a - b)^2 (r - 1)^2 + 4abr(r - 1)}}{2b} & \text{for } a, b > 0, \\
r - 1 & \text{for } a = 0,\ b > 0, \\
r & \text{for } b = 0,\ a > 0.
\end{cases} \tag{20}
$$



It is easily seen that this reduces to $\sqrt{r(r - 1)}$ for $a = b > 0$. In addition, the
following holds.

Proposition 1. The value of $\Omega$ given by (20) lies in the interval $(r - 1, r)$ for
$a, b > 0$.
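The value of $\Omega$ can be computed directly as the positive root of the quadratic obtained from (18). The sketch below is an illustration; the function name `omega_bar` and the test values are assumptions made here.

```python
import math

def omega_bar(r, a, b):
    """Positive solution of a*r/Omega - b*Omega/(r - 1) = a - b, i.e. condition (18)."""
    if b == 0:
        return float(r)
    c = (a - b) * (r - 1)
    # Positive root of b*Omega^2 + c*Omega - a*r*(r - 1) = 0.
    return (-c + math.sqrt(c * c + 4.0 * a * b * r * (r - 1))) / (2.0 * b)

r, a, b = 10, 2.0, 1.0                 # hypothetical example values
om = omega_bar(r, a, b)
residual = a * r / om - b * om / (r - 1) - (a - b)   # should vanish by (18)
om_sym = omega_bar(r, 1.0, 1.0)        # a = b case
om_a0 = omega_bar(r, 0.0, 1.0)         # a = 0 case
```

For $a = b$ the result should equal $\sqrt{r(r-1)}$, for $a = 0$ it should equal $r - 1$, and for $a, b > 0$ it should lie strictly between $r - 1$ and $r$, in agreement with Proposition 1.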

As a consequence of Proposition 1, for any $a, b \ge 0$, $(a, b) \ne (0, 0)$, the value
$\Omega$ defined by (20) satisfies $\Omega \in [r - 1, r]$. Taking into account that
$\hat{p}_{\mathrm{umvu}} = (r - 1)/(N - 1)$ is the UMVU estimator and that
$\hat{p}_{\mathrm{ml}} = r/N$ is the maximum likelihood (ML) estimator (Best, 1973), the
estimator $\hat{p}$ given by (6) with $\Omega \in [r - 1, r]$ and $d \in [-1, 0]$ is seen
to be a "reasonable" one, in the sense that it is "close" to the UMVU and ML estimators;
in particular, taking $d = \Omega - r$ implies that the inequality
$\hat{p}_{\mathrm{umvu}} \le \hat{p} \le \hat{p}_{\mathrm{ml}}$ is a sure event (i.e. holds
for each possible value of $N$). As will be seen in Section 3, in certain cases the
proposed estimator is also close to the asymptotically optimum estimator in the sense of
Mendo & Hernando (2010a).

The preceding arguments justify that the estimator given by (6) with
$\Omega \in [r - 1, r]$ and $d \in [-1, 0]$ is worth considering. In fact, for adequate
choices of $\Omega$ and $d$, it satisfies the important property that the risk is
guaranteed not to exceed its asymptotic value, as established by the next theorem.

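Theorem 2. Consider the loss function given by (2) with $a, b \ge 0$, $(a, b) \ne (0, 0)$.
For $r \ge 2$, the estimator $\hat{p} = \Omega/(N + d)$ with $\Omega$ given by (20) and
$d \in [\Omega - r, 0]$ satisfies

$$
\eta(p) < \lim_{p \to 0} \eta(p) \quad \text{for any } p \in (0, 1), \tag{21}
$$

with

$$
\lim_{p \to 0} \eta(p) = \left( a + \frac{b\Omega}{r-1} \right) \frac{\Omega^{r-1} \exp(-\Omega)}{(r-1)!}
 + a \left( \frac{r}{\Omega} - 1 \right). \tag{22}
$$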

3 Discussion and additional properties 

3.1 Significance of the results 

It has been shown in Section 2 that similar results to those already known for mean
absolute error, mean square error and confidence level also hold for generalized mean
absolute error (Theorem 1) and generalized mean symmetric ratio (Theorem 2). Specifically,
it has been proved that, for the proposed estimators,
$\sup_{p \in (0,1)} \eta(p) = \lim_{p \to 0} \eta(p)$. In the following, $\bar{\eta}$ will
denote the value of $\lim_{p \to 0} \eta(p)$, or equivalently
$\sup_{p \in (0,1)} \eta(p)$, for the estimators in Theorems 1 and 2.

The importance of these results lies in the fact that no knowledge is required about $p$.
Thus, given any desired value $A$ for the risk, an adequate $r$ can be selected such that
the risk is guaranteed not to exceed $A$, irrespective of $p$. Namely, it suffices to
choose $r$ as the minimum value for which $\bar{\eta}$, computed from (12) or from (22),
is less than or equal to $A$. As an illustration, Figure 1 depicts $\bar{\eta}$ as a
function of $r$ for the loss given by (2) with $a = b = 1$. It is seen, for example, that
$r = 75$ suffices to guarantee a risk lower than 0.1, that is, a mean symmetric ratio
lower than 1.1.

Figure 1: Risk guaranteed not to be exceeded for inverse-linear loss (2) with $a = b = 1$
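The selection of $r$ for a prescribed risk can be sketched as a simple search. The code below is an illustration for the inverse-linear loss; the function names and the search limit `r_max` are assumptions, and `eta_bar` evaluates the asymptotic risk as the limit of the right-hand side of (19) with $\Omega$ satisfying (18).

```python
import math

def omega_bar(r, a, b):
    """Positive solution of condition (18)."""
    if b == 0:
        return float(r)
    c = (a - b) * (r - 1)
    return (-c + math.sqrt(c * c + 4.0 * a * b * r * (r - 1))) / (2.0 * b)

def eta_bar(r, a, b):
    """Asymptotic risk for the inverse-linear loss with Omega from (18)."""
    om = omega_bar(r, a, b)
    q = math.exp((r - 1) * math.log(om) - om - math.lgamma(r))
    return (a + b * om / (r - 1)) * q + a * (r / om - 1.0)

def select_r(target, a, b, r_max=100000):
    """Smallest r >= 2 whose guaranteed risk does not exceed the target."""
    for r in range(2, r_max + 1):
        if eta_bar(r, a, b) <= target:
            return r
    raise ValueError("target risk not reached for r <= r_max")

r_needed = select_r(0.1, 1.0, 1.0)
```

For a target risk of 0.1 with $a = b = 1$ this search should return $r = 75$, consistent with the value read from Figure 1.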



3.2 Comparison with minimax estimators 

The presented results are valid under certain restrictions on the estimator. Namely, for
$L$ given by (2) the estimator has to be (6) with $\Omega$ given by (20) and $d$
restricted to a certain interval; for $L$ as in (1) both $\Omega$ and $d$ are fixed
values. It is natural to ask to what extent the results could be improved by considering
other estimators, i.e. how much lower risks could be guaranteed not to be exceeded (or
equivalently how much $\sup_{p \in (0,1)} \eta(p)$ could be reduced). By definition, an
estimator that is optimum according to this criterion (i.e. which minimizes
$\sup_{p \in (0,1)} \eta(p)$ over all possible estimators), if it exists, is a minimax
estimator.

This question can be addressed on the basis of the analysis in Mendo & Hernando (2010a).
For $a, b > 0$, both (1) and (2) satisfy the assumptions of Mendo & Hernando (2010a,
theorem 3). This assures that there exists a value of $\Omega$, denoted as $\Omega^*$,
which minimizes $\limsup_{p \to 0} \eta(p)$ over all estimators, including randomized
ones. Thus any estimator $\hat{p} = g(N)$ with $\lim_{n \to \infty} n g(n) = \Omega^*$ is
asymptotically optimum, in the sense of achieving the minimum possible
$\limsup_{p \to 0} \eta(p)$. This minimum, which will be denoted as $\eta^*$, restricts
the values $A$ that the risk can be guaranteed not to exceed for $p$ arbitrary. Namely, if
an estimator guarantees that $\eta(p) \le A$ for a given $A$, then necessarily
$A \ge \eta^*$. As a consequence, the risk that is guaranteed not to be exceeded by the
specific estimators considered in Section 2 is at most $\bar{\eta}/\eta^*$ times larger
than what could be achieved by a minimax estimator.

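The value $\eta^*$ is obtained as follows. Consider the loss function (1) first. For
$\Omega$ arbitrary, (5) gives

$$
\lim_{p \to 0} \eta(p) = \frac{(a + b)\,\Omega\,\gamma(r-1, \Omega)}{r-1} - (a + b)\,\gamma(r, \Omega)
 + a \left( 1 - \frac{\Omega}{r-1} \right). \tag{23}
$$

Its derivative

$$
\frac{d}{d\Omega} \lim_{p \to 0} \eta(p) = \frac{(a + b)\,\gamma(r-1, \Omega) - a}{r-1} \tag{24}
$$

is seen to be monotone increasing. Therefore the minimizing value $\Omega^*$ is unique,
and is determined by the condition $d(\lim_{p \to 0} \eta(p))/d\Omega = 0$, that is,

$$
\gamma(r-1, \Omega^*) = \frac{a}{a + b}. \tag{25}
$$

The value $\Omega^*$ can be computed numerically using (25), from which $\eta^*$ is
obtained. As for the loss function (2), for $\Omega$ arbitrary (5) gives

$$
\lim_{p \to 0} \eta(p) = \frac{b\Omega}{r-1}\,\gamma(r-1, \Omega) + (a - b)\,\gamma(r, \Omega)
 - \frac{ar}{\Omega}\,\gamma(r+1, \Omega) + a \left( \frac{r}{\Omega} - 1 \right). \tag{26}
$$

Again, it is easily seen that

$$
\frac{d}{d\Omega} \lim_{p \to 0} \eta(p) = \frac{b\,\gamma(r-1, \Omega)}{r-1}
 - \frac{ar\,(1 - \gamma(r+1, \Omega))}{\Omega^2} \tag{27}
$$

is monotone increasing, and thus there is a single minimizing value, $\Omega^*$, which
satisfies

$$
\frac{\Omega^{*2}}{r(r-1)} = \frac{a\,(1 - \gamma(r+1, \Omega^*))}{b\,\gamma(r-1, \Omega^*)}. \tag{28}
$$

Figure 2: Degradation factor $\bar{\eta}/\eta^*$ as a function of $a/b$ and $r$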
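Both minimizing values can be found numerically by locating the zero of the corresponding derivative, which is monotone increasing. The sketch below is an illustration under assumptions: it uses plain bisection, it evaluates the normalized incomplete gamma function only for positive integer first argument (via its finite-sum form), and all function names, the search interval and the values $r = 10$, $a = b = 1$ are choices made here.

```python
import math

def reg_lower_gamma(t, u):
    """gamma(t, u) of (3) for a positive integer t: 1 - exp(-u) * sum_{k<t} u^k / k!."""
    term = math.exp(-u)
    total = 0.0
    for k in range(t):
        total += term
        term *= u / (k + 1)
    return 1.0 - total

def bisect_root(f, lo, hi, iters=200):
    """Root of an increasing function f with f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def omega_star_linear_linear(r, a, b):
    """Zero of (a + b) * gamma(r - 1, Omega) - a, i.e. the condition for loss (1)."""
    return bisect_root(lambda om: (a + b) * reg_lower_gamma(r - 1, om) - a,
                       1e-9, 10.0 * r)

def omega_star_inverse_linear(r, a, b):
    """Zero of the derivative of the asymptotic risk for loss (2)."""
    def deriv(om):
        return (b * reg_lower_gamma(r - 1, om) / (r - 1)
                - a * r * (1.0 - reg_lower_gamma(r + 1, om)) / om**2)
    return bisect_root(deriv, 1e-9, 10.0 * r)

r, a, b = 10, 1.0, 1.0   # hypothetical example values
om_ll = omega_star_linear_linear(r, a, b)
om_il = omega_star_inverse_linear(r, a, b)
```

For $a = b$ the first value is the median of a gamma distribution with shape $r - 1$; for $r = 10$ it should be slightly below $r - 1 = 9$, while the second value should be somewhat above $r - 1$.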



Figure 2 shows, for the loss functions and estimators considered in Theorems 1 and 2, the
degradation factor $\bar{\eta}/\eta^*$ as a function of $a/b$, with $r$ as a parameter. As
is seen, for $a/b$ not too far from 1 the degradation factor is close to 1, that is, the
considered estimators are nearly optimum. Furthermore, there is a value of $a/b$ for which
each estimator is precisely optimum, i.e. minimax, as established by the following.

Proposition 2. For each of the loss functions (1) and (2), there exists a unique value of
the ratio $a/b$ for which the estimator considered in Theorem 1 or 2 respectively is
minimax, that is, minimizes $\sup_{p \in (0,1)} \eta(p)$ over all (possibly randomized)
estimators. For the loss function (1) this value is given by

$$
\frac{a}{b} = \frac{\gamma(r-1, r-1)}{1 - \gamma(r-1, r-1)}, \tag{29}
$$

and for (2) it is determined by the condition

$$
\frac{\gamma(r-1, \Omega)}{1 - \gamma(r+1, \Omega)} = \frac{r(\Omega - r + 1)}{\Omega(r - \Omega)}, \tag{30}
$$

with $\Omega$ as in (20).

Figure 3: Degradation factor $\bar{\eta}/\eta^*$ for $a = b$ as a function of $r$.
(a) Linear-linear loss (1). (b) Inverse-linear loss (2).



3.3 Minimaxity for asymptotically large r in the case a = b 

The specific values of the ratio $a/b$ determined by Proposition 2 can be shown to tend to
1 as $r \to \infty$. Related to this, the following establishes that for $a = b$ the
proposed estimators are asymptotically minimax as $r \to \infty$.

Proposition 3. For the loss functions (1) and (2) with $a = b$, each of the estimators
considered in Theorems 1 and 2 respectively approaches a minimax estimator asymptotically
as $r \to \infty$, in the sense that $\lim_{r \to \infty} \bar{\eta}/\eta^* = 1$.

As a consequence of this result, for $a = b$ and large $r$ the considered estimators are
approximately optimum in the minimax sense. This is illustrated in Figure 3, which shows
the degradation factor $\bar{\eta}/\eta^*$ as a function of $r$. In fact,
$\bar{\eta}/\eta^*$ is seen to be very close to 1 even for small $r$, and in particular
for the range of values of $r$ that are commonly used in practice. Thus, for example, the
mean absolute error (loss function (1) with $a = b = 1$) that is guaranteed not to be
exceeded according to Theorem 1 is within 1% of the minimax mean absolute error for
$7 \le r \le 1000$. Similarly, defining risk as mean symmetric ratio minus 1 (loss
function (2) with $a = b = 1$), the risk that is guaranteed not to be exceeded as per
Theorem 2 is within 0.1% of the minimax risk for the same range of values of $r$.
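The degradation factor can be estimated numerically from the asymptotic risk expressed as a function of $\Omega$. The sketch below is an illustration under assumptions: it writes the asymptotic risks for the two losses in terms of the normalized incomplete gamma function (obtained by substituting each loss into (5)), minimizes them over $\Omega$ with a ternary search that relies on their convexity, and uses names, search interval and the value $r = 10$ chosen here.

```python
import math

def reg_lower_gamma(t, u):
    """gamma(t, u) of (3) for a positive integer t."""
    term, total = math.exp(-u), 0.0
    for k in range(t):
        total += term
        term *= u / (k + 1)
    return 1.0 - total

def limit_linear_linear(r, a, b, om):
    """Asymptotic risk for loss (1) and an estimator with n*g(n) -> om."""
    g1, g0 = reg_lower_gamma(r - 1, om), reg_lower_gamma(r, om)
    return (a + b) * om * g1 / (r - 1) - (a + b) * g0 + a * (1.0 - om / (r - 1))

def limit_inverse_linear(r, a, b, om):
    """Asymptotic risk for loss (2) and an estimator with n*g(n) -> om."""
    g1, g0, g2 = (reg_lower_gamma(r - 1, om), reg_lower_gamma(r, om),
                  reg_lower_gamma(r + 1, om))
    return b * om * g1 / (r - 1) + (a - b) * g0 - a * r * g2 / om + a * (r / om - 1.0)

def argmin_convex(f, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def degradation(limit, r, a, b, omega_used):
    """Ratio of the asymptotic risk at omega_used to its minimum over Omega."""
    om_star = argmin_convex(lambda om: limit(r, a, b, om), 0.5, 3.0 * r)
    return limit(r, a, b, omega_used) / limit(r, a, b, om_star)

r = 10   # hypothetical example value
ratio_ll = degradation(limit_linear_linear, r, 1.0, 1.0, r - 1.0)
ratio_il = degradation(limit_inverse_linear, r, 1.0, 1.0, math.sqrt(r * (r - 1.0)))
```

For $r = 10$ and $a = b = 1$ both ratios should exceed 1 by less than 1%, with the inverse-linear case closer to 1, in line with the behaviour described for Figure 3.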



4 Proofs 

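For $\rho \in \mathbb{N}$, $\rho \ge 2$; $\mu \ge \rho - 1$; and $t \in (0, 1)$, let
$Y_\rho(\mu, t)$ be defined as

$$
Y_\rho(\mu, t) = \frac{(\mu - 1)^{(\rho-1)} \, t^{\rho-1} (1 - t)^{\mu-\rho+1}}{(\rho - 1)!}. \tag{31}
$$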
Figure 4: Illustration of (34)



In addition, the following well known relationship for the incomplete gamma function
(Abramowitz & Stegun, 1970, eq. (6.5.21)) will be used:

$$
\gamma(t - 1, u) = \gamma(t, u) + \frac{u^{t-1} \exp(-u)}{\Gamma(t)}. \tag{32}
$$

Proof of Theorem 1. The result immediately stems from (10) and the analysis in Mendo
(2009). □


Lemma 1. The following inequality holds for $r \ge 2$, $d \in [-1, 0]$, $j \ge 2$:

$$
\sum_{i=1}^{r-1} (i + d + 1)^j > \frac{(r + d)^{j+1}}{j + 1}. \tag{33}
$$

Proof. The sum in (33) can be expressed as the area covered by the $r - 1$ rectangles of
width 1 and height $(i + d + 1)^j$, $i = 1, \ldots, r - 1$ in Figure 4, or equivalently as
the shaded area comprised by $r - 2$ unit-width trapezoids plus two half-width rectangles.
Since the curve $(x + d + 3/2)^j$ touches the upper vertices of the trapezoids and is
convex, the following inequality can be written:

$$
\sum_{i=1}^{r-1} (i + d + 1)^j \ge \int_{1/2}^{r-3/2} (x + d + 3/2)^j \, dx + \frac{(d + 2)^j + (r + d)^j}{2}
 = \frac{(r + d)^{j+1}}{j + 1} + (d + 2)^j \left( \frac{1}{2} - \frac{d + 2}{j + 1} \right) + \frac{(r + d)^j}{2}. \tag{34}
$$

For $j \ge 3$ the term $1/2 - (d + 2)/(j + 1)$ in (34) is nonnegative, which assures that
(33) holds. For $j = 2$, the sum in (33) reduces to

$$
\sum_{i=1}^{r-1} (i + d + 1)^2 = \frac{(r + d)^3}{3} + \frac{-2d^3 - 6d^2 + 6(r - 2)d + 3r^2 - 4}{6}. \tag{35}
$$

The second summand in (35) has a derivative with respect to $d$ equal to
$-d^2 - 2d + r - 2$, which is nonnegative for $d \in [-1, 0]$, $r \ge 2$. Thus this
summand is lower bounded by its value at $d = -1$, i.e. $(3r^2 - 6r + 4)/6$, which is
positive. Therefore (33) also holds for $j = 2$. □

Lemma 2. Given $\rho$, $\mu$, $\Omega$ and $\delta$ such that (i) $\rho \in \mathbb{N}$,
$\rho \ge 2$; (ii) $\Omega \in [\rho - 1, \rho]$; (iii) $\delta \in [-\rho, 0]$; (iv)
$\mu > \rho - 1$; and (v) $\mu > \Omega - \delta - 1$, the following hold:

(a) $Y_\rho(\mu, \Omega/(\mu + \delta + 1))$ is a strictly increasing function of $\mu$,
with

$$
\lim_{\mu \to \infty} Y_\rho(\mu, \Omega/(\mu + \delta + 1)) = \Omega^{\rho-1} \exp(-\Omega)/(\rho - 1)!. \tag{36}
$$

(b) $Y_{\rho+1}(\mu + 1, \Omega/(\mu + \delta + 1))$ is a strictly increasing function of
$\mu$, with

$$
\lim_{\mu \to \infty} Y_{\rho+1}(\mu + 1, \Omega/(\mu + \delta + 1)) = \Omega^{\rho} \exp(-\Omega)/\rho!. \tag{37}
$$

Proof. According to hypotheses (iv) and (v), it holds that $\mu > \rho - 1$ and
$\Omega/(\mu + \delta + 1) < 1$, and thus $Y_\rho(\mu, \Omega/(\mu + \delta + 1))$ and
$Y_{\rho+1}(\mu + 1, \Omega/(\mu + \delta + 1))$ are well defined from (31). The proof will
be carried out separately for parts (a) and (b) of the Lemma.



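(a) It is convenient to make the change of variable $t = \Omega/(\mu + \delta + 1)$, by
which $Y_\rho(\mu, \Omega/(\mu + \delta + 1))$ is expressed as
$Y_\rho(\Omega/t - \delta - 1, t)$. It will be shown that

$$
\lim_{t \to 0} Y_\rho(\Omega/t - \delta - 1, t) = \Omega^{\rho-1} \exp(-\Omega)/(\rho - 1)!, \tag{38}
$$

which is equivalent to (36); and that $Y_\rho(\Omega/t - \delta - 1, t)$ is a strictly
decreasing function of $t$, which will imply that
$Y_\rho(\mu, \Omega/(\mu + \delta + 1))$ strictly increases with $\mu$. From (31),

$$
\log Y_\rho(\mu, t) = \sum_{i=1}^{\rho-1} \log(\mu - i) + \sum_{i=1}^{\rho-1} \log\frac{t}{i}
 + (\mu - \rho + 1) \log(1 - t), \tag{39}
$$

and thus

$$
\log Y_\rho\left( \frac{\Omega}{t} - \delta - 1, t \right)
 = \sum_{i=1}^{\rho-1} \log\left( 1 - \frac{(i + \delta + 1)t}{\Omega} \right)
 + \sum_{i=1}^{\rho-1} \log\frac{\Omega}{i}
 + \left( \frac{\Omega}{t} - \rho - \delta \right) \log(1 - t). \tag{40}
$$

Taking into account that $(\rho + \delta)t/\Omega = (\rho + \delta)/(\mu + \delta + 1) < 1$
as a result of (iv), and that $t < 1$, the Taylor expansion
$\log(1 - t) = -\sum_{j=1}^{\infty} t^j/j$, $|t| < 1$ can be used in (40) to yield

$$
\log Y_\rho\left( \frac{\Omega}{t} - \delta - 1, t \right) = -\sum_{j=0}^{\infty} c_j t^j, \tag{41}
$$

$$
c_0 = \Omega + \sum_{i=1}^{\rho-1} \log\frac{i}{\Omega}, \tag{42}
$$

$$
c_j = \frac{\Omega}{j + 1} - \frac{\rho + \delta}{j} + \frac{1}{j\,\Omega^j} \sum_{i=1}^{\rho-1} (i + \delta + 1)^j
 \quad \text{for } j \ge 1. \tag{43}
$$

The equalities (41) and (42) imply (38), and thus (36).

To prove that $Y_\rho(\Omega/t - \delta - 1, t)$ strictly decreases with $t$, it suffices
to show that the coefficients $c_j$ satisfy $c_j \ge 0$, $j \ge 1$, with strict inequality
for some $j$. For $j \ge 2$, Lemma 1 and (43) yield

$$
j(j + 1)\,c_j > j\Omega - (j + 1)(\rho + \delta) + \frac{(\rho + \delta)^{j+1}}{\Omega^j}
 = -j(\rho + \delta - \Omega) + (\rho + \delta) \left( \left( \frac{\rho + \delta}{\Omega} \right)^j - 1 \right). \tag{44}
$$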
(44) 
Taking into account that $\rho + \delta \ge 0$ by hypothesis (iii), and using the
inequality

$$
x^j - 1 \ge j(x - 1) \quad \text{for } x \ge 0,\ j \in \mathbb{N}, \tag{45}
$$

it follows from (44) that

$$
c_j > \frac{(\rho + \delta - \Omega)^2}{(j + 1)\,\Omega} \ge 0. \tag{46}
$$

For $j = 1$, (43) gives

$$
c_1 = \frac{\Omega^2 - 2\rho\Omega + (\rho - 1)(\rho + 2)}{2\Omega} + \frac{\delta(\rho - \Omega - 1)}{\Omega}. \tag{47}
$$

Consider the first summand in (47). The minimum of its numerator is attained at
$\Omega = \rho$ and equals $\rho - 2$. Thus, according to hypothesis (i), this summand is
nonnegative. By (ii) and (iii), the second summand is also nonnegative; and therefore
$c_1 \ge 0$. Consequently $Y_\rho(\Omega/t - \delta - 1, t)$ strictly decreases with $t$,
and thus $Y_\rho(\mu, \Omega/(\mu + \delta + 1))$ strictly increases with $\mu$.

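(b) Making the same change of variable as in part (a) and proceeding as in (39)-(41), it
is seen that

$$
\log Y_{\rho+1}\left( \frac{\Omega}{t} - \delta, t \right) = -\sum_{j=0}^{\infty} c'_j t^j, \tag{48}
$$

$$
c'_0 = \Omega + \sum_{i=1}^{\rho} \log\frac{i}{\Omega}, \tag{49}
$$

$$
c'_j = \frac{\Omega}{j + 1} - \frac{\rho + \delta}{j} + \frac{1}{j\,\Omega^j} \sum_{i=0}^{\rho-1} (i + \delta + 1)^j
 \ge c_j \ge 0 \quad \text{for } j \ge 1. \tag{50}
$$

The expressions (48)-(50) establish the desired results. □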

Proof of Theorem 2. As the case $a = 0$, $b > 0$ is already covered by Theorem 1, it will
be assumed that $a > 0$. This implies, according to Proposition 1, that $\Omega > r - 1$.

The equality (22) is obtained substituting the loss function (2) into (5) with $\Omega$
given by (20), and making use of (18) and (32).

For $a > 0$ the inequality (19) is strict, and can be expressed as

$$
\eta(p) < \frac{b\Omega\,Y_r(m, p)}{r - 1} + \frac{ar\,Y_{r+1}(m + 1, p)}{\Omega}
 + a \left( \frac{r}{\Omega} - 1 \right), \tag{51}
$$

$$
m = \lfloor \Omega/p - d \rfloor. \tag{52}
$$

From (52) it stems that $m \ge r - 1$. Each value of $m$ has an associated interval
$I_m \subseteq (0, 1)$ such that (52) holds if and only if $p \in I_m$. Namely,
$I_m = (p_l, p_u]$ with $p_l = \Omega/(m + d + 1)$, $p_u = \Omega/(m + d)$, except if
$m = r - 1$, in which case $d > \Omega - r$ and thus $p_l < 1$, $p_u > 1$; or if $m = r$
and $d = \Omega - r$, which gives $p_l < 1$, $p_u = 1$; in either case $I_m = (p_l, 1)$.
According to (51), and taking into account (22), to establish (21) it suffices to show
that, for $p \in (0, 1)$ and $m$ given by (52),

$$
Y_r(m, p) < \Omega^{r-1} \exp(-\Omega)/(r - 1)!, \tag{53}
$$

$$
Y_{r+1}(m + 1, p) < \Omega^{r} \exp(-\Omega)/r!. \tag{54}
$$
If $m = r - 1$ the left-hand sides of (53) and (54) are zero, and the inequalities are
clearly satisfied. Thus in the following it will be assumed that $m \ge r$.

As a step in the proof of (53) and (54), it will be shown that for $p \in (0, 1)$ and
$m \ge r$ related by (52), or equivalently for $m \ge r$ and $p \in I_m$, the following
inequalities hold:

$$
Y_r(m, p) \le \begin{cases}
Y_r(m, (r - 1)/m) & \text{if } (\Omega - r + 1)m < (r - 1)(d + 1), \\
Y_r(m, \Omega/(m + d + 1)) & \text{if } (\Omega - r + 1)m \ge (r - 1)(d + 1),
\end{cases} \tag{55}
$$

$$
Y_{r+1}(m + 1, p) \le \begin{cases}
Y_{r+1}(m + 1, r/(m + 1)) & \text{if } (r - \Omega)m \le \Omega - rd, \\
Y_{r+1}(m + 1, \Omega/(m + d)) & \text{if } (r - \Omega)m > \Omega - rd.
\end{cases} \tag{56}
$$

For $m \ge r$, it follows from (31) that $Y_r(m, p)$ considered as a function of
$p \in (0, 1)$ is maximum at $p_{\max} = (r - 1)/m < 1$, monotone increasing for
$p < p_{\max}$, and monotone decreasing for $p > p_{\max}$. As $d \le 0$ and
$\Omega > r - 1$, it is seen that $p_{\max} < p_u \le 1$, and that $p_{\max} \le p_l$ if
and only if $(\Omega - r + 1)m \ge (r - 1)(d + 1)$. This implies that $Y_r(m, p)$ is
bounded as given by (55). Regarding (56), the maximum of $Y_{r+1}(m + 1, p)$ with respect
to $p \in (0, 1)$ is attained at $p'_{\max} = r/(m + 1) < 1$. As $m \ge r$ and
$d \in [\Omega - r, 0]$, it stems that $p_l \le p'_{\max} < 1$, and that
$p_u < p'_{\max}$ if and only if $(r - \Omega)m > \Omega - rd$. This establishes (56).

The proof of (53) will be based on (55). Since $\Omega > r - 1$, the following definition
can be made: $\mu_1 = (r - 1)(d + 1)/(\Omega - r + 1)$. The fact that $d \ge \Omega - r$
implies that $\mu_1 \ge r - 1$. The upper condition in (55) is equivalent to $m < \mu_1$,
whereas the lower corresponds to $m \ge \mu_1$. As $m$ cannot be smaller than $r$, the
condition $m < \mu_1$ can only be met for some $m$ if $\mu_1 > r$, i.e. if
$d > \Omega - r + (\Omega - r + 1)/(r - 1)$. On the other hand, the condition
$m \ge \mu_1$ can always be satisfied by taking $m$ sufficiently large. Thus, (53) will be
established in two steps. First, it will be shown that $Y_r(m, \Omega/(m + d + 1))$
monotonically increases with $m \ge \mu_1$ and tends to
$\Omega^{r-1} \exp(-\Omega)/(r - 1)!$ as $m \to \infty$. This will prove that (53) holds
for all $m \ge \mu_1$. Second, it will be shown, for $\mu_1 > r$, that
$Y_r(m, (r - 1)/m)$ monotonically increases with $m \ge r$ and is smaller than
$\Omega^{r-1} \exp(-\Omega)/(r - 1)!$ for $m = \mu_1$. This will establish (53) for all
$m$ such that $r \le m < \mu_1$.

Regarding the first case, $m \ge \mu_1$, consider Lemma 2(a) with values $r$, $m$,
$\Omega$, $d$ respectively for $\rho$, $\mu$, $\Omega$, $\delta$. These values satisfy the
hypotheses of the Lemma (it is obvious that (i)-(iv) hold, and (v) is satisfied as well
because $m \ge \mu_1 \ge r - 1 \ge \Omega - d - 1$). According to this,
$Y_r(m, \Omega/(m + d + 1))$ monotonically increases with $m$ and tends to
$\Omega^{r-1} \exp(-\Omega)/(r - 1)!$ as $m \to \infty$. Therefore (53) holds for
$m \ge \mu_1$.

For the case $r \le m < \mu_1$, $\mu_1 > r$, using Lemma 2(a) (with values $r$, $m$,
$r - 1$, $-1$ respectively for $\rho$, $\mu$, $\Omega$, $\delta$; (v) holds because
$m > r - 1$) it is seen that $Y_r(m, (r - 1)/m)$ increases with $m$. The definition of
$\mu_1$ implies that $(r - 1)/\mu_1 = \Omega/(\mu_1 + d + 1)$, and thus

$$
Y_r(\mu_1, (r - 1)/\mu_1) = Y_r(\mu_1, \Omega/(\mu_1 + d + 1)). \tag{57}
$$

Applying Lemma 2(a) again (with values $r$, $\mu_1$, $\Omega$, $d$; (v) is satisfied
because $\mu_1 > r \ge \Omega - d > \Omega - d - 1$) to the right-hand side of this
equality shows that (57) is smaller than $\Omega^{r-1} \exp(-\Omega)/(r - 1)!$. Therefore
(53) holds for $r \le m < \mu_1$.

As for (54), it is seen that the lower condition in (56) is not met for any $m$ if $\Omega = r$,
whereas if $\Omega < r$ there exist values of $m$ which satisfy each of the conditions. These
two cases will be treated separately. In the case $\Omega < r$, the proof proceeds along the
same lines as that of (53). Let $\mu_2 = (\Omega - rd)/(r - \Omega)$. The facts that $d < 0$ and
$\Omega > r-1$ imply that $\mu_2 > r-1$. In addition, the upper condition in (54) can only be
met if $\mu_2 > r$. Thus it suffices to show first that $Y_{r+1}(m+1, \Omega/(m+d))$ monotonically
increases with $m \geq \mu_2$ and tends to $\Omega^r\exp(-\Omega)/r!$ as $m \to \infty$; and second that, if
$\mu_2 > r$, $Y_{r+1}(m+1, r/(m+1))$ monotonically increases with $m \geq r$ and is smaller than
$\Omega^r\exp(-\Omega)/r!$ for $m = \mu_2$. The first part directly stems from Lemma 1(b) (with values
$r$, $m$, $\Omega$, $d-1$; (v) holds because
$m \geq \mu_2 = (\Omega - rd)/(r - \Omega) = \Omega(1-d)/(r-\Omega) - d > \Omega - d$).
Regarding the second, the increasing character of $Y_{r+1}(m+1, r/(m+1))$
with $m$ is also established by Lemma 1(b) (with values $r$, $m$, $r$, $0$; (v) holds because
$m \geq r > r-1$). The definition of $\mu_2$ implies that $r/(\mu_2+1) = \Omega/(\mu_2+d)$, from which

\[ Y_{r+1}(\mu_2+1, r/(\mu_2+1)) = Y_{r+1}(\mu_2+1, \Omega/(\mu_2+d)), \tag{58} \]

and applying Lemma 1(b) (with values $r$, $\mu_2$, $\Omega$, $d-1$; (v) holds because, as argued
previously, $\mu_2 > \Omega - d$) to the right-hand side of (58) establishes that it is smaller
than $\Omega^r\exp(-\Omega)/r!$. In the case $\Omega = r$, the expression (56) reduces to its upper part,
and (54) follows from Lemma 1(b) ($r$, $m$, $r$, $0$; (v) holds because $m \geq r > r-1$). This
completes the proof. □
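The two algebraic facts about $\mu_2$ used in this proof (the alternative form invoked for hypothesis (v), and the coincidence of the two arguments of $Y_{r+1}$ at $m = \mu_2$) can be spot-checked with exact rational arithmetic. The following is only a sketch: the numerical values of `r`, `omega` and `d` are arbitrary illustrative choices, not values from the paper, picked so that $r - 1 < \Omega < r$ and $d < 0$.

```python
from fractions import Fraction

def mu2(r, omega, d):
    """mu_2 = (Omega - r*d) / (r - Omega); requires Omega < r."""
    return (omega - r * d) / (r - omega)

# Illustrative values only (assumed, not taken from the paper):
# an integer r, r - 1 < Omega < r, and d < 0.
r = 5
omega = Fraction(9, 2)
d = Fraction(-1, 3)

m2 = mu2(r, omega, d)

# Equivalent form of mu_2 used when checking hypothesis (v).
m2_alt = omega * (1 - d) / (r - omega) - d

# The two second arguments of Y_{r+1} in (58), evaluated at m = mu_2.
arg_left = Fraction(r) / (m2 + 1)
arg_right = omega / (m2 + d)

print(m2, m2_alt, arg_left, arg_right)
```

Exact fractions are used so that the equalities hold without any floating-point tolerance.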

Proof of Proposition [?]. For $L$ as in (1), equating $d(\lim_{p \to 0} \eta(p))/d\Omega$ given by (24) to
0, solving for $a/b$ and particularizing to $\Omega = r-1$ yields (29).
As for $L$ given by (2), from (18) it is seen that

\[ \frac{a}{b} = \frac{(\Omega - r + 1)\,\Omega}{(r-1)(r-\Omega)}. \tag{59} \]

Setting $\Omega^* = \Omega$ in (28) and combining with (59) yields (30). □
Lemma 3. For any $k \in \mathbb{N}$, the factorial $k!$ satisfies the following:

\[ k! > \sqrt{2\pi}\, k^{k+1/2} \exp(-k) \tag{60} \]

Proof. These expressions follow from (Abramowitz & Stegun, 1970, eq. (6.1.38)). □
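As a quick numerical illustration of (60), the ratio of $k!$ to the bound can be tabulated for a few values of $k$. This is a sketch only: the helper names are hypothetical and the chosen values of $k$ are arbitrary. The ratio should exceed 1 and, by Stirling's formula, approach 1 as $k$ grows.

```python
import math

def stirling_lower_bound(k):
    """Right-hand side of (60): sqrt(2*pi) * k**(k + 1/2) * exp(-k)."""
    return math.sqrt(2 * math.pi) * k ** (k + 0.5) * math.exp(-k)

def stirling_ratio(k):
    """k! divided by the bound in (60); the lemma states this exceeds 1."""
    return math.factorial(k) / stirling_lower_bound(k)

ks = (1, 2, 5, 10, 50, 100)
ratios = [stirling_ratio(k) for k in ks]
for k, ratio in zip(ks, ratios):
    print(k, ratio)
```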

Lemma 4. For any sequence of numbers $\delta_k$ such that $0 < \delta_k < 1$,
$\lim_{k \to \infty} \gamma(k, k+\delta_k) = 1/2$.

Proof. According to Adell & Jodrá (2005, lemma 1), $\lim_{k \to \infty} \gamma(k, k) = 1/2$. From (32),

\[ \gamma(k, k+1) - \gamma(k+1, k+1) = (k+1)^k \exp(-k-1)/k!. \tag{62} \]

As a result of Lemma 3 the right-hand side of (62) tends to 0 as $k \to \infty$, and
therefore $\lim_{k \to \infty} \gamma(k, k+1) = \lim_{k \to \infty} \gamma(k, k) = 1/2$. The fact that $\gamma(t, u)$ is monotone
increasing in $u$ implies that $\gamma(k, k) < \gamma(k, k+\delta_k) < \gamma(k, k+1)$, and the desired result
follows. □
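The sandwich argument in this proof can be illustrated numerically. The sketch below assumes that $\gamma$ denotes the regularized lower incomplete gamma function (consistent with the limit 1/2), and evaluates it for integer first argument through the Poisson-sum identity $\gamma(k, x) = 1 - \sum_{j<k} e^{-x} x^j / j!$. The helper names and the choice $\delta_k = 0.5$ are illustrative assumptions.

```python
import math

def reg_lower_gamma(k, x):
    """Regularized lower incomplete gamma function for integer k >= 1, x > 0,
    via 1 - sum_{j<k} exp(-x) x^j / j! (terms computed in log space)."""
    tail = sum(math.exp(-x + j * math.log(x) - math.lgamma(j + 1)) for j in range(k))
    return 1.0 - tail

def rhs_62(k):
    """Right-hand side of (62): (k+1)^k exp(-k-1) / k!."""
    return math.exp(k * math.log(k + 1) - (k + 1) - math.lgamma(k + 1))

rows = {}
for k in (1, 10, 100, 1000):
    low = reg_lower_gamma(k, k)            # gamma(k, k)
    mid = reg_lower_gamma(k, k + 0.5)      # gamma(k, k + delta_k), delta_k = 0.5 (assumed)
    high = reg_lower_gamma(k, k + 1)       # gamma(k, k + 1)
    gap = high - reg_lower_gamma(k + 1, k + 1)  # left-hand side of (62)
    rows[k] = (low, mid, high, gap, rhs_62(k))
    print(k, *rows[k])
```

For each $k$ the three values are ordered as in the proof, and all of them drift toward 1/2 as $k$ increases, while the gap in (62) shrinks.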

Lemma 5. For $r > 2$ and $a = b$, the solution $\Omega^*$ to (25) lies in $(r - 4/3, r - 2 + \log 2)$,
and $\lim_{r \to \infty} (\Omega^* - r) = -4/3$.

Proof. The result follows from Alm (2003). □
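The bounds in Lemma 5 can be checked numerically under one assumption that is not spelled out in this section: that for $a = b$ condition (25) amounts to $\gamma(r-1, \Omega^*) = 1/2$, i.e. $\Omega^*$ is the median of a gamma distribution with shape $r-1$ and unit scale, with $\gamma$ the regularized lower incomplete gamma function. The sketch below solves that equation by bisection; the function names are hypothetical.

```python
import math

def reg_lower_gamma(k, x):
    """Regularized lower incomplete gamma function for integer k >= 1, x > 0."""
    tail = sum(math.exp(-x + j * math.log(x) - math.lgamma(j + 1)) for j in range(k))
    return 1.0 - tail

def omega_star(r, tol=1e-12):
    """Solve gamma(r-1, Omega) = 1/2 by bisection (assumed reading of (25) for a = b)."""
    shape = r - 1
    lo, hi = 1e-9, float(r)  # the function is increasing in Omega and the root lies below r
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(shape, mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

solutions = {r: omega_star(r) for r in (3, 10, 200)}
for r, w in solutions.items():
    print(r, w, w - r)  # w - r should approach -4/3 as r grows
```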
Lemma 6. For $r > 2$ and $a = b$, the solution $\Omega^*$ to (28) lies in $(r-1, r)$.

Proof. Using (32) the condition (28) can be written, for $a = b$, as

\[ \gamma(r, \Omega^*) \left( 1 + \frac{(\Omega^*)^2}{r(r-1)} \right) = 1 - \frac{(\Omega^*)^r \exp(-\Omega^*)}{r!} \left( \frac{\Omega^*}{r-1} - 1 \right). \tag{63} \]

Let $v_1(\Omega^*)$ and $v_2(\Omega^*)$ respectively denote the left-hand and right-hand sides of (63),
considered as functions of $\Omega^*$. It is easily seen that $v_1$ is monotone increasing, whereas
$v_2$ is monotone decreasing on the interval $(r-1, r)$. From Lemma 5 it follows that
$\gamma(r, r-1) < 1/2$, which implies that $v_1(r-1) < 1$. On the other hand, $v_2(r-1) = 1$.
Therefore the solution to (63), or equivalently to (28), satisfies $\Omega^* > r-1$. By
analogous arguments it is seen that $v_1(r) > 1$ and $v_2(r) < 1$. Therefore the solution
satisfies $\Omega^* < r$. □
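The bracketing argument of this proof lends itself to a numerical sketch. The code below assumes that (63) reads $\gamma(r,\Omega^*)\,(1 + (\Omega^*)^2/(r(r-1))) = 1 - ((\Omega^*)^r e^{-\Omega^*}/r!)\,(\Omega^*/(r-1) - 1)$, with $\gamma$ the regularized lower incomplete gamma function; this form is an inference from the stated properties of $v_1$ and $v_2$, not a quotation. The function names are hypothetical.

```python
import math

def reg_lower_gamma(k, x):
    """Regularized lower incomplete gamma function for integer k >= 1, x > 0."""
    tail = sum(math.exp(-x + j * math.log(x) - math.lgamma(j + 1)) for j in range(k))
    return 1.0 - tail

def v1(r, w):
    """Assumed left-hand side of (63)."""
    return reg_lower_gamma(r, w) * (1 + w * w / (r * (r - 1)))

def v2(r, w):
    """Assumed right-hand side of (63)."""
    term = math.exp(r * math.log(w) - w - math.lgamma(r + 1))  # w^r exp(-w) / r!
    return 1 - term * (w / (r - 1) - 1)

def solve_63(r, tol=1e-12):
    """Bisection on (r-1, r), where v1 - v2 changes sign from negative to positive."""
    lo, hi = float(r - 1), float(r)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if v1(r, mid) < v2(r, mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

roots = {r: solve_63(r) for r in (3, 10, 100)}
for r, w in roots.items():
    print(r, w, w - r + 1)  # the last column is delta_r^* = Omega_r^* - r + 1
```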

Lemma 7. For any $\delta_1, \delta_2 \in \mathbb{R}$, the sequence of functions
$h_k(\delta) = \exp(\delta)(1 + \delta/(k-1))^{-k+1}$, $k \in \mathbb{N}$, $k > 2$, $\delta \in [\delta_1, \delta_2]$ converges uniformly to 1 as $k \to \infty$.

Proof. Let $k_0 = \max\{-\delta_1, 0\} + 1$. As $(1 + \delta/(k-1)) > 0$ for $\delta \geq \delta_1$, $k > k_0$, taking
logarithms in the definition of $h_k(\delta)$ gives $\log h_k(\delta) = \delta - (k-1)\log(1 + \delta/(k-1))$.
Replacing $k$ by a continuous variable $x > 1$ and using the inequality $\log(1+t) > t/(1+t)$,
it is seen that

\[ \frac{d}{dx}\left[ \delta - (x-1)\log\left(1 + \frac{\delta}{x-1}\right) \right] = -\log\left(1 + \frac{\delta}{x-1}\right) + \frac{\delta}{x-1+\delta} < 0. \tag{64} \]

This implies that $h_{k+1}(\delta) < h_k(\delta)$ for $k > k_0$. In addition, $h_k(\delta)$, $\delta \in [\delta_1, \delta_2]$ is a
continuous function and converges pointwise to 1 as $k \to \infty$. Thus Dini's theorem
(Apostol, 1974, p. 248) can be applied, which assures that the convergence is uniform.

□
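The uniform convergence stated in Lemma 7 can be observed numerically by tracking the largest deviation of $h_k$ from 1 over a grid. This is a sketch under stated assumptions: the interval $[-1, 1]$, the grid size and the values of $k$ are arbitrary illustrative choices, and a finite grid only approximates the supremum.

```python
import math

def h(k, delta):
    """h_k(delta) = exp(delta) * (1 + delta/(k-1))**(-(k-1)), for k >= 2."""
    return math.exp(delta) * (1 + delta / (k - 1)) ** (-(k - 1))

def sup_deviation(k, d1, d2, n=201):
    """Largest |h_k(delta) - 1| over an n-point grid on [d1, d2]."""
    grid = [d1 + (d2 - d1) * i / (n - 1) for i in range(n)]
    return max(abs(h(k, d) - 1) for d in grid)

d1, d2 = -1.0, 1.0          # illustrative interval (assumed)
ks = (5, 50, 500, 5000)     # all larger than k_0 = max{-d1, 0} + 1 = 2
devs = [sup_deviation(k, d1, d2) for k in ks]
for k, dev in zip(ks, devs):
    print(k, dev)
```

The printed deviations shrink as $k$ grows, in line with the monotonicity $h_{k+1}(\delta) < h_k(\delta)$ used in the proof.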

Proof of Proposition [?]. For $L$ as in (1), particularizing (23) to $a = b$, $\Omega = r-1$ and
using (32),

\[ \frac{\bar\eta}{a} = 2\bigl(\gamma(r-1, r-1) - \gamma(r, r-1)\bigr) = \frac{2(r-1)^{r-2}\exp(-r+1)}{(r-2)!}. \tag{65} \]

In the following, the value $\Omega^*$ determined by (25) for a given $r$ will be denoted as $\Omega_r^*$.
For $a = b$, $\Omega = \Omega_r^*$, using (23), (25) and (32),

\[ \frac{\eta^*}{a} = \frac{2\Omega_r^*\,\gamma(r-1, \Omega_r^*)}{r-1} - 2\gamma(r, \Omega_r^*) + 1 - \frac{\Omega_r^*}{r-1} = \frac{2(\Omega_r^*)^{r-1}\exp(-\Omega_r^*)}{(r-1)!}. \tag{66} \]

From (65) and (66), with $h_k(\delta)$ as defined in Lemma 7, it follows that

\[ \frac{\bar\eta}{\eta^*} = \left(\frac{r-1}{\Omega_r^*}\right)^{r-1} \exp(\Omega_r^* - r + 1) = h_r(\delta_r^*) \tag{67} \]

with $\delta_r^* = \Omega_r^* - r + 1$. Lemma 5 establishes that $\delta_r^* \in [-1/3, -1 + \log 2]$ and
$\lim_{r \to \infty} \delta_r^* = -1/3$. On the other hand, by Lemma 7, $h_k(\delta) \to 1$ uniformly on
$[-1/3, -1 + \log 2]$ as $k \to \infty$. Therefore, according to Apostol (1974, theorem 9.16),
$\lim_{k, r \to \infty} h_k(\delta_r^*)$ exists and equals 1. Thus, in particular, $\lim_{r \to \infty} h_r(\delta_r^*) = 1$, which
combined with (67) establishes that $\lim_{r \to \infty} \bar\eta/\eta^* = 1$.
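The first part of this proof can be traced numerically. The sketch below relies on several assumptions that should be read as such: $\gamma$ is taken to be the regularized lower incomplete gamma function; (65) is read as $\bar\eta/a = 2(\gamma(r-1,r-1) - \gamma(r,r-1)) = 2(r-1)^{r-2}e^{-r+1}/(r-2)!$; (66) as $\eta^*/a = 2(\Omega_r^*)^{r-1}e^{-\Omega_r^*}/(r-1)!$; and (25) for $a = b$ as $\gamma(r-1, \Omega_r^*) = 1/2$. All function names are hypothetical.

```python
import math

def reg_lower_gamma(k, x):
    """Regularized lower incomplete gamma function for integer k >= 1, x > 0."""
    tail = sum(math.exp(-x + j * math.log(x) - math.lgamma(j + 1)) for j in range(k))
    return 1.0 - tail

def omega_star(r, tol=1e-12):
    """Bisection solution of gamma(r-1, Omega) = 1/2 (assumed reading of (25), a = b)."""
    lo, hi = 1e-9, float(r)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(r - 1, mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def eta_bar_gamma(r):
    """Middle expression of (65)."""
    return 2 * (reg_lower_gamma(r - 1, r - 1) - reg_lower_gamma(r, r - 1))

def eta_bar_closed(r):
    """Right-hand side of (65): 2 (r-1)^(r-2) exp(-r+1) / (r-2)!."""
    return 2 * math.exp((r - 2) * math.log(r - 1) - (r - 1) - math.lgamma(r - 1))

def eta_star(r):
    """Right-hand side of (66): 2 Omega*^(r-1) exp(-Omega*) / (r-1)!."""
    w = omega_star(r)
    return 2 * math.exp((r - 1) * math.log(w) - w - math.lgamma(r))

def h(k, delta):
    """h_k(delta) from Lemma 7."""
    return math.exp(delta) * (1 + delta / (k - 1)) ** (-(k - 1))

ratios = {}
for r in (3, 10, 50, 200):
    delta_star = omega_star(r) - r + 1
    ratios[r] = eta_bar_closed(r) / eta_star(r)   # left-hand side of (67)
    print(r, ratios[r], h(r, delta_star))          # the two columns should agree
```

The ratio stays slightly above 1 and approaches it as $r$ grows, which is the content of the limit derived from (67).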
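For $L$ as in (2), and with $\Omega^*$ given by (28), let $\Omega_r^*$ and $\delta_r^*$ be defined as before. In
addition, let $\Omega_r$ denote the value of $\Omega$ corresponding to a given $r$, and $\delta_r = \Omega_r - r + 1$.
Particularizing (26) to $a = b$, $\Omega = \Omega_r$ and using (32) gives

\[ \frac{\bar\eta}{a} = \frac{r}{\Omega_r}\bigl(\gamma(r-1, \Omega_r) - \gamma(r+1, \Omega_r)\bigr) + \left(\frac{\Omega_r}{r-1} - \frac{r}{\Omega_r}\right)\gamma(r-1, \Omega_r) + \frac{r}{\Omega_r} - 1 \]
\[ \phantom{\frac{\bar\eta}{a}} = \frac{\Omega_r^{r-1}\exp(-\Omega_r)}{(r-1)!}\left(1 + \frac{r}{\Omega_r}\right) + \left(\frac{\Omega_r}{r-1} - \frac{r}{\Omega_r}\right)\gamma(r-1, \Omega_r) + \frac{r}{\Omega_r} - 1. \tag{68} \]
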
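Thus $\bar\eta$ can be written as $a(\theta_0 + \theta_1 + \theta_2)$ with

\[ \theta_0 = \frac{(r-1+\delta_r)^{r-1}\exp(-r+1-\delta_r)}{(r-1)!}\left(2 + \frac{1-\delta_r}{r-1+\delta_r}\right), \tag{69} \]

\[ \theta_1 = \frac{\frac{\delta_r^2}{r-1} + 2\delta_r - 1}{r-1+\delta_r}\,\gamma(r-1, r-1+\delta_r), \tag{70} \]

\[ \theta_2 = \frac{1-\delta_r}{r-1+\delta_r}. \tag{71} \]

The quotient $\theta_2/\theta_0$ is computed as

\[ \frac{\theta_2}{\theta_0} = \frac{(1-\delta_r)\exp(\delta_r)}{\left(1 + \frac{\delta_r}{r-1}\right)^{r}\left(2 + \frac{1-\delta_r}{r-1+\delta_r}\right)} \cdot \frac{(r-1)!\exp(r-1)}{(r-1)^{r-1/2}} \cdot \frac{1}{\sqrt{r-1}}. \tag{72} \]
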
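Proposition [?] implies that $\delta_r \in (0, 1)$. Taking into account that $r > 2$, it is seen that
the first factor in (72) lies in a bounded interval for all $r$, whereas, by the equality in
Lemma 3, the second factor tends to $\sqrt{2\pi}$ as $r \to \infty$. As a result, $\lim_{r \to \infty} \theta_2/\theta_0 = 0$.
Similarly, $\theta_1/\theta_0$ is expressed as

\[ \frac{\theta_1}{\theta_0} = \frac{\left(\frac{\delta_r^2}{r-1} + 2\delta_r - 1\right)\exp(\delta_r)}{\left(1 + \frac{\delta_r}{r-1}\right)^{r}\left(2 + \frac{1-\delta_r}{r-1+\delta_r}\right)} \cdot \frac{(r-1)!\exp(r-1)}{(r-1)^{r-1/2}} \cdot \gamma(r-1, r-1+\delta_r) \cdot \frac{1}{\sqrt{r-1}}. \tag{73} \]
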

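As before, the first factor in the right-hand side of (73) is bounded, and the second
tends to $\sqrt{2\pi}$. The third factor tends to 1/2 by Lemma 4. Thus $\lim_{r \to \infty} \theta_1/\theta_0 = 0$.

The quotient $\eta^*/a$ is given as in (68) with $\Omega_r$ replaced by $\Omega_r^*$, and
$\eta^* = a(\theta_0^* + \theta_1^* + \theta_2^*)$, where $\theta_0^*$, $\theta_1^*$ and $\theta_2^*$ are obtained from (69)-(71) with $\delta_r$
replaced by $\delta_r^*$. Lemma 6 implies that $\delta_r^* \in (0, 1)$, and arguments analogous to those in
the preceding paragraph show that $\theta_1^*/\theta_0^*$ and $\theta_2^*/\theta_0^*$ tend to 0 as $r \to \infty$. As a result,
$\lim_{r \to \infty} \bar\eta/\eta^*$ can be computed as

\[ \lim_{r \to \infty} \frac{\bar\eta}{\eta^*} = \lim_{r \to \infty} \frac{\theta_0}{\theta_0^*} = \lim_{r \to \infty} \frac{\left(1 + \frac{\delta_r}{r-1}\right)^{r-1}\exp(-\delta_r)}{\left(1 + \frac{\delta_r^*}{r-1}\right)^{r-1}\exp(-\delta_r^*)} \cdot \frac{2 + \frac{1-\delta_r}{r-1+\delta_r}}{2 + \frac{1-\delta_r^*}{r-1+\delta_r^*}}. \tag{74} \]
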

Since $\delta_r, \delta_r^* \in (0, 1)$ for all $r$, it is clear that the second factor in the rightmost part of
(74) tends to 1 as $r \to \infty$. By Lemma 7, $(1 + \delta/(r-1))^{r-1}\exp(-\delta) \to 1$ uniformly for
$\delta \in (0, 1)$. This implies that the numerator and denominator of the first factor in (74)
tend to 1 as $r \to \infty$ (note that $\delta_r$ and $\delta_r^*$ are not required to converge). Consequently
$\lim_{r \to \infty} \bar\eta/\eta^* = 1$. □